METHOD OF EVALUATING A RECURSIVE QUERY OF A DATABASE

CROSS-REFERENCE TO RELATED APPLICATION 

This patent application is a continuation of patent application serial number 
07/487,346, filed March 1, 1990, now abandoned, which in turn is a continuation-in-part 
of copending patent application serial number 07/286,425, filed December 19, 1988, and 
assigned to the same assignee as the present application. 

BACKGROUND OF THE INVENTION 

The present invention relates generally to database systems and more particularly 
to a method of evaluating a recursive query of a database. 

Database systems are being used to store and manage more and more different 
kinds of data. As the use of database systems has expanded and the quantities of data 
stored in a database have increased, much effort has been devoted to improving existing 
database systems and developing new systems with new and better capabilities. 

Relational Database Concepts 

Data in a relational database are perceived by the user as being arranged in tables. 
Each table may be thought of as specifying a "relation" among the data in that table; 
therefore, each table is referred to as a "relation". Each row in a relation may be thought 
of as one data record. The rows are referred to as "tuples". 

To "query" a database means to request information from it. To "evaluate" a 
query means to obtain the requested information. Sometimes the requested information 
can be obtained directly by looking it up in one of the relations. If the requested 
information does not appear in any of the relations, it must be derived, for example by 
comparing two or more tuples in a relation or by combining tuples from two different
relations. The following four examples will help to illustrate these concepts.

EXAMPLE 1 

Consider an historical database that contains data about people. Some of the data 
are arranged in a table or "relation" called PARENT. The data in the PARENT relation 
are arranged in the form 

PARENT (NAME, PNAME) 

where PNAME is the name of a parent of a person named NAME. For example, "PARENT
(Andrew, William)" represents a tuple which says that William is a parent of Andrew.

Others of the data indicate who is a friend of whom; these data are arranged in a 
FRIEND relation in the form 

FRIEND (NAME, FNAME) 

where FNAME is the name of a friend of a person named NAME.

Still others of the data are in a PERSON relation and are arranged in the form 

PERSON (NAME, SEX)

where NAME is the name of a person and SEX is the sex of that person. Representative
PARENT, FRIEND and PERSON relations are depicted in Tables I through III,
respectively.



PARENT relation

| NAME (Name of Person) | PNAME (Name of Parent) |
| Andrew | William |
| Andrew | Mary |
| Mary | John |
| Mary | Anne |
| John | Richard |
| John | Wilma |

TABLE I

FRIEND relation

| NAME | FNAME |
| Andrew | Michael |
| Mary | Joan |
| William | Robert |

TABLE II

PERSON relation

| NAME | SEX |
| Andrew | M |
| William | M |
| Mary | F |

TABLE III



A query of the form "FIND PARENTS OF [X]" is an example of a request for
information that is directly obtainable by looking up data in this particular database. The 
response to such a query would be the names of the parents of X. 

A query of the form "FIND MOTHER OF [X]" is an example of a request for
information that cannot be obtained by looking it up. This is because the PARENT 
relation does not include the sex of the parents and, unlike a human, a computer does not 
know, for example, that "William" would ordinarily be a father and "Mary" a mother. 
Therefore, the requested information must be derived from information in the database, 
for example by (1) finding the parents of X in the PARENT relation, (2) finding the sex 
of each of those parents in the PERSON relation, and (3) selecting the female parent. 
The response to such a query would be the name of the mother of X. 
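As an illustration only, the three derivation steps can be sketched in Python by modeling each relation as a set of tuples copied from Tables I and III. The helper names `find_parents` and `find_mother` are hypothetical and not part of the described system.

```python
# Relations as sets of tuples (set semantics, as in the relational model).
# PARENT(NAME, PNAME) from Table I; PERSON(NAME, SEX) from Table III.
PARENT = {
    ("Andrew", "William"), ("Andrew", "Mary"),
    ("Mary", "John"), ("Mary", "Anne"),
    ("John", "Richard"), ("John", "Wilma"),
}
PERSON = {("Andrew", "M"), ("William", "M"), ("Mary", "F")}


def find_parents(x):
    """Directly retrievable: look up every PNAME paired with NAME = x."""
    return {pname for name, pname in PARENT if name == x}


def find_mother(x):
    """Derived: (1) find the parents of x, (2) look up each parent's sex
    in PERSON, (3) keep only the female parent."""
    sex_of = dict(PERSON)  # NAME -> SEX
    return {p for p in find_parents(x) if sex_of.get(p) == "F"}


print(sorted(find_parents("Andrew")))  # ['Mary', 'William']
print(sorted(find_mother("Andrew")))   # ['Mary']
```

A parent who is missing from PERSON (for example John or Anne, the parents of Mary) cannot be classified, so `find_mother("Mary")` returns an empty set with these sample data.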



EXAMPLE 2

Consider a factory database for keeping track of parts that are used to make
engines. The data are arranged in two relations. The first relation, SUPPLIER, contains
data arranged in the form

SUPPLIER (PART, SUPPLIER, CITY)

where PART represents the name (or part number or other convenient identifier) of a
particular part, SUPPLIER represents a name of a source of that part, and CITY represents
the location of that supplier. A few typical entries might be

SUPPLIER (Needle Valve, Bill's Brass Works, Buffalo)
SUPPLIER (Needle Valve, Paul's Plumbing Mfg. Co., Pittsburgh)
SUPPLIER (1/4" Screw, Mac's Machine Shop, Milwaukee)
SUPPLIER (Short Spring, Sam's Spring Specialties, Springfield) 

and so on. The second relation, SUBPART, contains data in the form 

SUBPART (SPART, SSUBPART, SQTY) 

where SPART represents the name of a part that requires a subpart, SSUBPART represents
the name of that subpart, and SQTY represents how many of that subpart are used in that
part. Some examples are

SUBPART (Carburetor, Valve Assembly, 2) 
SUBPART (Carburetor, 1/4" Screw, 16)
SUBPART (Carburetor, Short Spring, 3)
SUBPART (Valve Assembly, Needle Valve, 1)
SUBPART (Valve Assembly, 1/4" Screw, 4)

indicating that each carburetor requires two valve assemblies, sixteen 1/4" screws, three
short springs, and so on.

A query of the form "FIND SUPPLIERS OF [X]" is an example of a request for 
information that is directly obtainable by retrieving data from this particular database. 
The response to such a query would be the names of all suppliers of part X.

A query of the form "HOW MANY SUPPLIERS SUPPLY [X]" is an example of
a request for information that is not directly obtainable but that can be derived from
information in the database, for example by finding all suppliers of X and then counting
how many supplier names are found. The response to such a query would be the number 
of suppliers that supply part X. 
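A minimal Python sketch of the two queries, using the four sample SUPPLIER entries as a set of tuples (the function names are hypothetical). Because the lookup returns a set, the count is the number of distinct supplier names found.

```python
# SUPPLIER(PART, SUPPLIER, CITY): the typical entries listed above.
SUPPLIER = {
    ("Needle Valve", "Bill's Brass Works", "Buffalo"),
    ("Needle Valve", "Paul's Plumbing Mfg. Co.", "Pittsburgh"),
    ('1/4" Screw', "Mac's Machine Shop", "Milwaukee"),
    ("Short Spring", "Sam's Spring Specialties", "Springfield"),
}


def find_suppliers(x):
    """FIND SUPPLIERS OF [X]: a direct lookup of SUPPLIER where PART = x."""
    return {supplier for part, supplier, _city in SUPPLIER if part == x}


def how_many_suppliers(x):
    """HOW MANY SUPPLIERS SUPPLY [X]: derived by counting the names found."""
    return len(find_suppliers(x))


print(how_many_suppliers("Needle Valve"))  # 2
print(how_many_suppliers("Carburetor"))    # 0
```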
EXAMPLE 3



Consider an airline reservation system. Data in this system are arranged in a 
FLIGHT relation in the form 

FLIGHT (FROM, TO, DISTANCE, DTIME, ATIME, COST)

where FROM is the name of a departure city, TO is the name of an arrival city, DISTANCE
is the mileage between those two cities, DTIME is the time of departure, ATIME is the
time of arrival, and COST is the cost of a ticket on that flight. TABLE IV presents a
typical FLIGHT relation.



FLIGHT relation

| FROM | TO | DISTANCE | DTIME | ATIME | COST |
| Burbank | Reno | 350 | 7:30 AM | 8:40 AM | 250 |
| Burbank | Reno | 350 | 9:30 AM | 10:40 AM | 250 |
| Burbank | Reno | 350 | 11:30 PM | 12:30 AM | 150 |
| Burbank | New York | 2500 | 8:00 AM | 4:00 PM | 650 |
| Reno | New York | 2400 | 11:15 AM | 7:30 PM | 595 |

TABLE IV



A query of the form "FIND CHEAPEST FLIGHT BETWEEN [X] and [Y]" is an
example of a request for information that cannot be obtained merely by retrieving data
from this database but that can be derived from information in the database, for example
by finding the fares of all flights between city X and city Y and then comparing the
various fares to find which is lowest. The response to such a query would be the flight
number of the cheapest flight between X and Y.
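Table IV has no flight-number column, so this illustrative Python sketch returns the cheapest matching FLIGHT tuple itself. Only direct flights from X to Y are considered, as in the text; the function name is hypothetical.

```python
# FLIGHT(FROM, TO, DISTANCE, DTIME, ATIME, COST) from Table IV.
FLIGHT = [
    ("Burbank", "Reno", 350, "7:30 AM", "8:40 AM", 250),
    ("Burbank", "Reno", 350, "9:30 AM", "10:40 AM", 250),
    ("Burbank", "Reno", 350, "11:30 PM", "12:30 AM", 150),
    ("Burbank", "New York", 2500, "8:00 AM", "4:00 PM", 650),
    ("Reno", "New York", 2400, "11:15 AM", "7:30 PM", 595),
]
COST = 5  # column position of COST


def cheapest_flight(x, y):
    """Collect every flight from x to y, then compare fares and keep the lowest.
    Returns the whole FLIGHT tuple, or None if there is no direct flight."""
    candidates = [f for f in FLIGHT if f[0] == x and f[1] == y]
    return min(candidates, key=lambda f: f[COST], default=None)


print(cheapest_flight("Burbank", "Reno"))
# ('Burbank', 'Reno', 350, '11:30 PM', '12:30 AM', 150)
```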

EXAMPLE 4 

Consider a study of the spread of a sexually transmitted virus. One of the 
questions under investigation is the spread of the virus by heterosexual transmission. 
Records have been compiled of all heterosexual encounters in a defined population. 
These records are arranged in an epidemic-study database in an ENCOUNTER relation in
the form

ENCOUNTER (MALE, FEMALE, DATE, CITY, COUNTY).
Additional data respecting individuals in the population under study are arranged in a 
PERSON relation: 

PERSON (PERSON, AGE, VACCINATED) 

where VACCINATED is a yes-or-no entry indicating whether the person has been vaccinated
against the virus. Representative ENCOUNTER and PERSON relations are depicted in 
Tables V and VI, respectively. 



ENCOUNTER relation (E)

| MALE (e1) | FEMALE (e2) | DATE (e3) | CITY (e4) | COUNTY (e5) |
| Andrew | Susan | 12-2-88 | Detroit | |
| Andrew | Mary | 9-6-88 | New York | |
| John | Susan | 11-19-88 | Boston | |
| Richard | Susan | 2-14-89 | Boston | |
| Joe | Anne | 3-25-89 | New York | |

TABLE V



PERSON relation (P)

| PERSON (p1) | AGE (p2) | VACCINATED (p3) |
| Andrew | 25 | YES |
| John | 18 | NO |
| Richard | 23 | |
| Joe | 40 | YES |
| Susan | 22 | NO |
| Mary | 29 | YES |
| Anne | 17 | NO |

TABLE VI



A query that requests the names of all women who had encounters with a certain 
man is an example of a request for information that can be retrieved directly. A query 
that seeks the names of all vaccinated women who had encounters with a certain man is 
an example of a request for information that can be derived by (1) retrieving the names of 
all women who had encounters with that man according to the ENCOUNTER relation and 
(2) checking the names of each of those women according to the PERSON relation to find
out which of them have been vaccinated. 
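The same two-step derivation, sketched in Python. The ENCOUNTER tuples (MALE, FEMALE, DATE) come from Table V; the vaccination status is passed in as a hypothetical mapping so that the sketch does not depend on any particular Table VI entry. Function names are illustrative.

```python
# ENCOUNTER(MALE, FEMALE, DATE) from Table V (CITY and COUNTY omitted).
ENCOUNTER = {
    ("Andrew", "Susan", "12-2-88"),
    ("Andrew", "Mary", "9-6-88"),
    ("John", "Susan", "11-19-88"),
    ("Richard", "Susan", "2-14-89"),
    ("Joe", "Anne", "3-25-89"),
}


def women_encountered(man):
    """Retrieved directly: every FEMALE paired with MALE = man."""
    return {female for male, female, _date in ENCOUNTER if male == man}


def vaccinated_women_encountered(man, vaccinated):
    """Derived: (1) retrieve the women, (2) keep those marked "YES".
    `vaccinated` maps a PERSON name to "YES" or "NO" (hypothetical input)."""
    return {w for w in women_encountered(man) if vaccinated.get(w) == "YES"}


status = {"Susan": "YES", "Mary": "NO"}  # hypothetical values
print(sorted(vaccinated_women_encountered("Andrew", status)))  # ['Susan']
```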

Relational Algebra 

As the volume of data in a database grows larger and the nature of the relations 
grows more complex, evaluating complicated queries becomes more difficult and time- 
consuming. Simplifying the task of evaluating a given query is known as "optimizing" 
the query. There may be only a few, or many hundreds, of ways to evaluate a
complicated query, all of which provide the same answer but some of which are much 
more efficient than others. "Optimizing" such a query may mean choosing the best of all 
possible ways of evaluating the query, but usually "optimizing" means finding a 
reasonable number of ways according to techniques that are known to increase 
computational efficiency and choosing the best of those. 

A set of relational operators collectively referred to as "relational algebra" has 
been developed for use in optimizing the evaluation of a complicated query in a large 
database. A query is translated into a relational algebra expression; the expression is
simplified according to certain procedures; query plans for evaluating the simplified
expression are generated; and the most efficient of these plans is selected and carried out 
to provide the desired response. See generally C. J. Date, An Introduction to Database 
Systems (4th Ed.) Vol. I, Addison-Wesley 1986, chapters 13 and 16, and references cited 
therein. 

The relational algebra includes a set of operators. These operators can be 
compared with arithmetic operators such as "+" and "÷". Just as an arithmetic operator
operates on one or two "input" numbers and provides a new "output" number, so each 
relational algebra operator takes one or two relations as "inputs" and provides a new 
relation as "output". A few examples of these operators are the SELECT, PROJECT,
UNION, INTERSECTION, and JOIN operators. 
The SELECT operator obtains specified rows from a relation. For example, a
query seeking the names of Andrew's parents could be phrased as "SELECT entries
respecting Andrew from the PARENT relation" (see Example 1 above). This SELECT
operation would be represented in the relational algebra as

$$\sigma_{\mathit{name}=\text{"Andrew"}}(\mathrm{PARENT})$$

In response, the SELECT operator would provide the following new relation from the
PARENT relation:

Andrew William
Andrew Mary

The PROJECT operator obtains specified columns from a relation. For example,
a query seeking the names of all persons who have children could be phrased as
"PROJECT PNAME entries from the PARENT relation". This PROJECT operation would
be represented in the relational algebra as

$$\pi_{\mathit{pname}}(\mathrm{PARENT})$$

and would provide the following new relation from the PARENT relation:

William
Mary
John
Anne
Richard
Wilma
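A minimal Python sketch of SELECT and PROJECT over relations stored as sets of positional tuples; the function names `select` and `project` are illustrative only. Because results are sets, projecting the NAME column collapses the six PARENT rows into three distinct names.

```python
# PARENT(NAME, PNAME) from Table I; column 0 is NAME, column 1 is PNAME.
PARENT = {
    ("Andrew", "William"), ("Andrew", "Mary"),
    ("Mary", "John"), ("Mary", "Anne"),
    ("John", "Richard"), ("John", "Wilma"),
}


def select(relation, predicate):
    """SELECT: keep the rows (tuples) that satisfy the predicate."""
    return {row for row in relation if predicate(row)}


def project(relation, *columns):
    """PROJECT: keep only the given column positions; the result is a set,
    so duplicate rows collapse into one."""
    return {tuple(row[i] for i in columns) for row in relation}


print(sorted(select(PARENT, lambda row: row[0] == "Andrew")))
# [('Andrew', 'Mary'), ('Andrew', 'William')]
print(sorted(project(PARENT, 1)))
# [('Anne',), ('John',), ('Mary',), ('Richard',), ('William',), ('Wilma',)]
print(len(project(PARENT, 0)))  # 3
```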

The UNION operator collects all the rows of each of two relations. The UNION
operation is represented in relational algebra as

$$A \cup B$$

where A and B are relations. For example, $\mathrm{PARENT} \cup \mathrm{FRIEND}$ in the historical
database would provide a new relation containing parent-names and friend-names of
everyone who has either a parent or a friend or both. In providing the new relation, the
NAME entries in the PARENT relation would be correlated with the NAME entries in the
FRIEND relation; for example, the name Andrew in the PARENT relation would be
considered to refer to the same person as the name Andrew in the FRIEND relation.
The INTERSECTION operator collects the rows common to two relations.
The INTERSECTION operation is represented in relational algebra as

$$A \cap B$$

For example, $\mathrm{PARENT} \cap \mathrm{FRIEND}$ in the historical database would provide a new
relation containing only those tuples that appear in both relations, that is, each person
paired with someone who is both a parent and a friend of that person.
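Modeling relations as Python sets of positional tuples, UNION and INTERSECTION reduce to the built-in set operators `|` and `&`. This is a sketch; with the sample data of Tables I and II no row appears in both relations, so a hypothetical extra FRIEND row is added to show a non-empty intersection.

```python
# PARENT(NAME, PNAME) from Table I and FRIEND(NAME, FNAME) from Table II.
# Both have two columns, so they are union-compatible position by position.
PARENT = {
    ("Andrew", "William"), ("Andrew", "Mary"),
    ("Mary", "John"), ("Mary", "Anne"),
    ("John", "Richard"), ("John", "Wilma"),
}
FRIEND = {("Andrew", "Michael"), ("Mary", "Joan"), ("William", "Robert")}

union = PARENT | FRIEND          # UNION: rows found in either relation
intersection = PARENT & FRIEND   # INTERSECTION: rows found in both relations

print(len(union))    # 9
print(intersection)  # set()

# With a hypothetical extra FRIEND row, a common row appears.
print(PARENT & (FRIEND | {("Mary", "John")}))  # {('Mary', 'John')}
```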

The JOIN operator combines rows from each of two relations according to a
specified condition. In relational algebra the JOIN operation is represented as

$$A \bowtie_{\mathit{condition}} B$$

For example, the join operation $E \bowtie_{\mathit{female}=\mathit{person}} P$ in the epidemic-study database of
Example 4 above would provide a new relation by combining the information in each
tuple of the ENCOUNTER relation (E) with the information in that tuple of the PERSON
relation (P) having an entry under PERSON that matches the entry under FEMALE in the tuple
of the ENCOUNTER relation.
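A nested-loop sketch of the JOIN operator in Python, applied to the ENCOUNTER and PERSON example with only the columns it needs. The function `join` is hypothetical; a real system would typically use a hash or sort-merge join for an equality condition such as this one.

```python
# ENCOUNTER(MALE, FEMALE, DATE) from Table V (CITY and COUNTY omitted)
# and PERSON(PERSON, AGE) from Table VI (VACCINATED omitted).
E = {
    ("Andrew", "Susan", "12-2-88"),
    ("Andrew", "Mary", "9-6-88"),
    ("John", "Susan", "11-19-88"),
    ("Richard", "Susan", "2-14-89"),
    ("Joe", "Anne", "3-25-89"),
}
P = {("Andrew", 25), ("John", 18), ("Richard", 23), ("Joe", 40),
     ("Susan", 22), ("Mary", 29), ("Anne", 17)}


def join(left, right, condition):
    """JOIN: concatenate each pair of rows that satisfies the condition
    (a simple nested-loop join)."""
    return {l + r for l in left for r in right if condition(l, r)}


# E join_{female = person} P: column 1 of E against column 0 of P.
result = join(E, P, lambda e, p: e[1] == p[0])
for row in sorted(result):
    print(row)
```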

Recursive Queries 

A kind of database query which has grown more important in recent years is a 
recursive query. Such a query can be described as a query which queries itself. A 
recursive query can be evaluated only by deriving information recursively. A general 
discussion of the mathematical concept of recursion can be found in J. Bradley, 
Introduction to Discrete Mathematics, ch. 6, Addison-Wesley 1988; see also E. Roberts,
Thinking Recursively, John Wiley 1986. 

As a simple example of a recursive query, consider a query of the form "FIND
GRANDPARENTS OF [X]" directed to the historical database of Example 1 above. This
database contains no information about grandparents. However, the requested
information can be recursively derived from the information in the database, for example
by a query of the form "FIND PARENTS OF [FIND PARENTS OF [X]]".
If the number of iterations required to evaluate a recursive query is known in
advance, then the evaluation process is relatively straightforward. For example, the
request to find the grandparents of X requires exactly two iterations: one to find the
parents of X and one to find the parents of the parents of X. However, if the number of
iterations is not known, then the evaluation becomes far more difficult; an example of a
request in which the number of iterations is not known is a request to find all ancestors
of X.
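The contrast can be made concrete with a small Python sketch over the PARENT relation of Table I (function names hypothetical): the grandparent query applies the parent lookup exactly twice, while the ancestor query keeps iterating until an iteration produces no new names. Subtracting the names already found also guarantees termination if the data contain a cycle.

```python
# PARENT(NAME, PNAME) from Table I.
PARENT = {
    ("Andrew", "William"), ("Andrew", "Mary"),
    ("Mary", "John"), ("Mary", "Anne"),
    ("John", "Richard"), ("John", "Wilma"),
}


def parents_of(names):
    """One iteration: the parents of every name in `names`."""
    return {pname for name, pname in PARENT if name in names}


def grandparents(x):
    """Known number of iterations: exactly two."""
    return parents_of(parents_of({x}))


def ancestors(x):
    """Unknown number of iterations: repeat until no new ancestor appears."""
    found = set()
    frontier = parents_of({x})
    while frontier:
        found |= frontier
        frontier = parents_of(frontier) - found  # only names not seen before
    return found


print(sorted(grandparents("Andrew")))  # ['Anne', 'John']
print(sorted(ancestors("Andrew")))
# ['Anne', 'John', 'Mary', 'Richard', 'William', 'Wilma']
```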

As the volume of data in a database grows larger and the nature of the relations
expressed by the data grows more complex, the time required for even a very powerful
computer to respond to a complicated recursive query can become unacceptably long,
especially when the number of iterations required to derive the response is not known in
advance. Accordingly, the efficient evaluation of recursive queries has become a matter
of critical importance in the design of modern database systems. A comprehensive survey
of this problem is presented by F. Bancilhon and R. Ramakrishnan in "An Amateur's
Introduction to Recursive Query Processing Strategies" in the Proceedings of the
ACM-SIGMOD Conference, Washington, D.C., May 1986.

The relational algebra does not have recursion operators and hence cannot support 
recursive queries. Some relatively simple recursive queries can be expressed in transitive 
closure form, and transitive closure operators have been proposed for use in translating
such queries into relational algebra expressions (R. Agrawal, "Alpha: An Extension of
Relational Algebra to Express a Class of Recursive Queries", Proceedings of the Third
International Conference on Data Engineering, Los Angeles, California, February 3-5,
1987; S. Ceri et al., "Translation and Optimization of Logic Queries: the Algebraic
Approach", Proceedings of the Eleventh International Conference on Very Large Data
Bases, Kyoto, Japan, August 1986). However, not all recursive queries can be expressed 
in transitive closure form. 
From the foregoing, it will be apparent that there is a need for a way to optimize
recursive queries, especially those which cannot be expressed in transitive closure form, 
for efficient evaluation in large and complex database systems. 

SUMMARY OF THE INVENTION 

The present invention provides a method of evaluating a recursive query in a 
database system by translating the query into an expression that includes a novel fixpoint 
operator and using a novel set of transformation procedures to simplify the translated
query.

Briefly and in general terms, a method of evaluating a recursive query includes the 
steps of translating the query into a relational algebra expression that includes a fixpoint 
operator, optimizing the expression according to a set of transformation procedures, and 
evaluating the optimized expression by reference to data in the database. The 
transformation procedures include commuting a projection operation with a fixpoint 
operation, commuting a selection operation with a fixpoint operation, distributing a join 
operation over a fixpoint operation, and regrouping a join operation and a fixpoint 
operation. 

Regrouping means applying the commutation and association rules, typically to an 
expression having a fixpoint and several join operators. The selection operation may be a 
selection predicate on a direct mapping column, a global selection predicate, or a 
selection predicate that includes a join operation. 

Other aspects and advantages of the present invention will become apparent from 
the following detailed description, taken in conjunction with the accompanying drawings,
illustrating by way of example the principles of the invention. 
BRIEF DESCRIPTION OF THE DRAWINGS

FIGURE 1 is a data flow diagram depicting a preferred embodiment of a method 
of optimizing recursive queries according to the invention; 

FIG. 2 is a flow diagram depicting initial and recursive inputs of a fixpoint
operator as referenced in the "translate using Φ" process of FIG. 1; and

FIG. 3 is a flow diagram depicting a generalized version of the fixpoint operator 
shown in FIG. 2. 

DESCRIPTION OF THE PREFERRED EMBODIMENT 

As shown in the drawings for purposes of illustration, the invention is embodied in 
a novel method of evaluating recursive queries in a database system. Relational algebra
provides a powerful technique for optimizing the evaluation of database queries, but 
recursive queries have not been amenable to such optimization techniques. 

In accordance with the invention, a novel fixpoint operator and new transformation
procedures are provided for translating a recursive query into a relational algebra 
expression and then simplifying that expression. In the form of this simplified 
expression, the query can be evaluated much more efficiently than would otherwise be 
possible. 

As shown in state diagram form in FIGURE 1, a method of evaluating a recursive 
query of a database 11 comprises translating a recursive query into an expression that 
includes a fixpoint operator, as indicated by a "translate using Φ" process circle 13;
optimizing the expression according to a set of transformation procedures as indicated by 
an "optimize" process circle 15; and evaluating the optimized expression, as indicated by 
an "evaluate" process circle 17, by reference to data in the database 11. 

The recursive query is received from a user as indicated by an input box 19 and 
an arrow extending from the box 19 to the "translate" circle 13. The "user" may be a 
person at a computer terminal, but the user could also be, for example, an electronic
device, an application program, or the like. Similarly, the result of evaluating the
optimized query is provided to the user as indicated by an output box 21 and an arrow
extending from the "evaluate" circle 17 to the box 21. The user who receives the output
is usually the same user as the one who generated the query, but this need not be the 
case; the user that generates a query could specify that the result be sent somewhere else. 

Several novel transformation procedures are provided according to the invention. 
These new transformation procedures include commuting a projection operation with a 
fixpoint operation as indicated by a "commute projection" process circle 25, commuting a
selection operation with a fixpoint operation as indicated by a "commute selection" 
process circle 27, distributing a join operation over a fixpoint operation as indicated by a 
"distribute join" process circle 29, and regrouping join and fixpoint operations as 
indicated by a "regroup" process circle 31. 

Arrows extend in both directions between the "optimize" process circle 15 and
each of the process circles 25, 27, 29 and 31. These arrows indicate that application of 
one transformation procedure may result in an expression that requires application of 
another transformation procedure and that the optimization process may require one or 
more than one application of any given transformation procedure during the course of 
simplifying a given expression. Thus, depending on the characteristics of the expression 
being optimized, various ones of the transformation procedures may be used at various 
times during the optimization. Of course, some of the procedures may not be used at all 
in a given case. 

Commuting a selection operation with a fixpoint operation comprises commuting a 
selection predicate on a direct mapping column with a fixpoint operation as indicated by a 
"direct map" process circle 33, commuting a global selection predicate with a fixpoint 
operation as indicated by a "global" process circle 35, and commuting a selection 
predicate that includes a join with a fixpoint operation as indicated by a "join" process 
circle 37. Arrows extend in both directions between the "commute selection" circle 27 
and each of the circles 33, 35 and 37 to indicate that one or more than one of the
procedures respecting commuting a selection and a fixpoint operator may be used as
needed; of course, none of the "commute selection" procedures may be required in some
cases.

Regrouping a join operation and a fixpoint operation comprises commutation and 
association as indicated by a "commutation" process circle 39 and an "association" 
process circle 41, respectively. Arrows extending in both directions between the 
"regroup" process circle 31 and each of the circles 39 and 41 indicate that one or more of 
the commutation and association procedures may be used. Again, some cases might not 
require either procedure. 

In addition to the novel transformation procedures provided by the invention, a
number of transformation procedures are already known in the relational algebra.
Various ones of these may also be used to simplify a recursive query as indicated by a
"transform procedures" process circle 43. Arrows extending in both directions between
the circle 43 and the circle 15 indicate that these previously-known transformation
procedures may be used once, several times or not at all in any given optimization. 

The fixpoint operator according to the invention enhances the declarative power of
relational algebra by supporting recursive queries. It is expected that the introduction of
the fixpoint operator will benefit many database and computer applications such as
computer aided design and manufacture (CAD/CAM), software engineering (CASE), and
artificial intelligence (AI) applications.

As with other relational operators, the inputs (operands) and output of the fixpoint
operator are relations. The fixpoint operator supports least fixed point semantics. The
fixpoint operator can compute both linear and mutually recursive relations.
In a simple form the fixpoint operator is defined symbolically by an expression of
the form

$$(I_1, \ldots, I_M)\;\Phi_{CE:OE}\;(R_1, \ldots, R_N, R_c)$$

where

$I_i$ represents the $i$-th initial input,
$CE$ represents a Condition Expression,
$OE$ represents an Output Expression, and
$R_j$ represents the $j$-th recursive input.

This form of the fixpoint operator is depicted diagrammatically in FIG. 2. Each initial
input I is a relation. There may be one or more such inputs; M initial inputs are shown
in FIG. 2. Each recursive input R is also a relation. There may be none, one or more of
these recursive inputs; N recursive inputs are shown in FIG. 2. A recursive input may,
but need not, be the same as an initial input. The output of the fixpoint operator is fed
back as a recursive input $R_c$; this recursive input $R_c$ differs from the N other recursive
inputs in that the input $R_c$ is derived by the fixpoint operator whereas the other recursive
inputs are not.

More particularly, during a first iteration the initial inputs are utilized to provide a
first output. This output is fed back as the recursive input $R_c$. The recursive input $R_c$
and the N other recursive inputs are utilized to provide a second output, and so on for as
many iterations as are required.

If an output is not fed back as one of the recursive inputs, the fixpoint operator 
simplifies to conventional join and union operations. 

In general, equality predicates in the condition expression are treated as join
columns during processing of the Φ operator while any other predicates in the condition
expression represent additional conditions which the tuples of the recursively derived
relation generated by the Φ operator must satisfy.
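A minimal Python sketch of the simple form of the operator, under stated assumptions: for brevity it accepts a single recursive input $R_1$ besides $R_c$, treats the equality predicates of CE as hash-join columns and the remaining predicates as a filter, and feeds back only the tuples derived in the previous iteration (semi-naive evaluation, our choice). Because $R_c$ appears once in each derivation, this reaches the same least fixed point as feeding back the whole output. The function `fixpoint` and its parameters are hypothetical. Termination is guaranteed when OE only copies existing values; an OE that computes new values (such as a running sum) may need further conditions to stop.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, Set, Tuple

Row = Tuple


def fixpoint(
    initial: Iterable[Row],
    r1: Iterable[Row],
    rc_key: Callable[[Row], Hashable],
    r1_key: Callable[[Row], Hashable],
    extra: Callable[[Row, Row], bool],
    output: Callable[[Row, Row], Row],
) -> Set[Row]:
    """Least fixed point of  R_c = I  union  OE(R_c joined with R_1 under CE).

    The equality predicates of CE are the join columns rc_key(R_c row) ==
    r1_key(R_1 row); any other predicates of CE are checked by `extra`.
    """
    index = defaultdict(list)  # hash R_1 on its join columns once
    for row in r1:
        index[r1_key(row)].append(row)

    result = set(initial)  # first iteration: output is the initial input
    delta = set(result)    # tuples not yet fed back as R_c
    while delta:
        derived = {
            output(rc, r)
            for rc in delta
            for r in index.get(rc_key(rc), ())
            if extra(rc, r)
        }
        delta = derived - result  # only genuinely new tuples are fed back
        result |= delta
    return result


# ANCESTOR(DNAME, ANAME) = PARENT Phi_{name = aname : dname, pname}(PARENT, ANCESTOR)
PARENT = {
    ("Andrew", "William"), ("Andrew", "Mary"),
    ("Mary", "John"), ("Mary", "Anne"),
    ("John", "Richard"), ("John", "Wilma"),
}
ANCESTOR = fixpoint(
    initial=PARENT,
    r1=PARENT,
    rc_key=lambda anc: anc[1],                 # ANCESTOR.aname ...
    r1_key=lambda par: par[0],                 # ... = PARENT.name
    extra=lambda anc, par: True,               # no further predicates
    output=lambda anc, par: (anc[0], par[1]),  # OE: dname, pname
)
print(len(ANCESTOR))  # 12
```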
A more general form of the fixpoint operator, corresponding to a plurality of K
mutually recursive relations, is defined symbolically by an expression of the form

$$(I_1, \ldots, I_M)\;\Phi_{CE_1:OE_1,\,\ldots,\,CE_K:OE_K}\;(R_1, \ldots, R_N, R_{c1}, \ldots, R_{cK})$$

This form of the fixpoint operator is depicted diagrammatically in FIG. 3. Each initial
input I is a relation. There may be one or more such inputs; M initial inputs are shown
in FIG. 3. Each recursive input R is also a relation. There may be none, one or more of
these recursive inputs; N recursive inputs are shown in FIG. 3. A recursive input may,
but need not, be the same as an initial input. The fixpoint operator provides K recursive
relations as outputs and these are fed back as recursive inputs $R_{c1}$ through $R_{cK}$; these K
recursive inputs differ from the N other recursive inputs in that the inputs $R_{c1}, \ldots, R_{cK}$ are
derived by the fixpoint operator whereas the other recursive inputs are not. In general, M
may but need not be equal to K.

The sets of initial or recursive inputs for each recursive relation need not be
disjoint. For mutually recursive relations the recursive inputs cannot be disjoint because 
if they were the relations would not be mutually recursive. 

The ultimate output of the Φ operator is a single relation representing the
Cartesian product of the K mutually recursive output relations. Additional relational
operators can be used to extract individual relations from the output.
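The mutually recursive case can be sketched in Python with a hypothetical two-relation example: ODD and EVEN hold pairs of nodes joined by paths of odd and of even length in a hypothetical EDGE relation, and each is defined in terms of the other. Rather than materializing the Cartesian product of the K outputs, the sketch returns the K relations separately, which is what projecting the product onto each relation would yield provided every relation is non-empty. Names and data are illustrative assumptions, and the iteration is naive (each rule is re-applied to the full current relations).

```python
def mutual_fixpoint(initials, rules):
    """Compute K mutually recursive relations together.

    initials[k] is the initial input of relation k; rules[k] maps the list of
    all K current relations to the rows it derives for relation k. All K
    relations are iterated simultaneously until none of them changes.
    """
    current = [set(rows) for rows in initials]
    while True:
        derived = [rule(current) for rule in rules]
        if all(d <= c for d, c in zip(derived, current)):
            return current  # least fixed point reached
        current = [c | d for c, d in zip(current, derived)]


def compose(r, s):
    """Join r and s on r's second column = s's first column."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


EDGE = {(1, 2), (2, 3), (3, 4)}  # hypothetical base relation
ODD, EVEN = mutual_fixpoint(
    initials=[EDGE, set()],
    rules=[
        lambda rels: compose(rels[1], EDGE),  # ODD  <- EVEN joined with EDGE
        lambda rels: compose(rels[0], EDGE),  # EVEN <- ODD joined with EDGE
    ],
)
print(sorted(ODD))   # [(1, 2), (1, 4), (2, 3), (3, 4)]
print(sorted(EVEN))  # [(1, 3), (2, 4)]
```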

The invention will now be more formally described in mathematical terms. The 
following examples will show how the fixpoint operator can be freely intermixed with 
other relational algebra operators to pose powerful queries. 

The historical "people" database as described in Example 1 above includes the
base relations PARENT, FRIEND and PERSON as given in Tables I through III. A
derived relation ANCESTOR, of the form

ANCESTOR (DNAME, ANAME)

where DNAME is the name of a person and ANAME is the name of an ancestor of that
person, is defined as

$$\mathrm{PARENT}\;\Phi_{\mathit{name}=\mathit{aname}:\;\mathit{dname},\,\mathit{pname}}\;(\mathrm{PARENT}, \mathrm{ANCESTOR})$$
The fixpoint operator may then be used to express various recursive queries of the
"people" database. Specifically, a query of the form "Find all ancestors of John" is
expressed:

$$\pi_{\mathit{aname}}\bigl(\sigma_{\mathit{dname}=\text{"John"}}(\mathrm{PARENT}\;\Phi_{\mathit{name}=\mathit{aname}:\;\mathit{dname},\,\mathit{pname}}\;(\mathrm{PARENT}, \mathrm{ANCESTOR}))\bigr)$$

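A query of the form "Find all people who are either friends of ancestors of John
or friends of John" is expressed:

$$\pi_{\mathit{fname}}\bigl(\sigma_{\mathit{dname}=\text{"John"}}(\mathrm{PARENT}\;\Phi_{\mathit{name}=\mathit{aname}:\;\mathit{dname},\,\mathit{pname}}\;(\mathrm{PARENT}, \mathrm{ANCESTOR})) \bowtie_{\mathit{aname}=\mathit{name}} \mathrm{FRIEND}\bigr) \;\cup\; \pi_{\mathit{fname}}\bigl(\sigma_{\mathit{name}=\text{"John"}}(\mathrm{FRIEND})\bigr)$$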



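A query of the form "Find the friends of John's ancestors" is expressed:

$$\pi_{\mathit{fname}}\bigl(\sigma_{\mathit{dname}=\text{"John"}}(\mathrm{PARENT}\;\Phi_{\mathit{name}=\mathit{aname}:\;\mathit{dname},\,\mathit{pname}}\;(\mathrm{PARENT}, \mathrm{ANCESTOR})) \bowtie_{\mathit{aname}=\mathit{name}} \mathrm{FRIEND}\bigr)$$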
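The three queries can be checked against the sample data with a short Python sketch (function names hypothetical). With Tables I and II, John's ancestors are Richard and Wilma, neither of whom appears in FRIEND, so the two friend queries return nothing for John; running them for Andrew gives a non-empty answer.

```python
# PARENT(NAME, PNAME) from Table I; FRIEND(NAME, FNAME) from Table II.
PARENT = {
    ("Andrew", "William"), ("Andrew", "Mary"),
    ("Mary", "John"), ("Mary", "Anne"),
    ("John", "Richard"), ("John", "Wilma"),
}
FRIEND = {("Andrew", "Michael"), ("Mary", "Joan"), ("William", "Robert")}


def ancestor_relation(parent):
    """ANCESTOR(DNAME, ANAME): iterate the fixpoint until nothing new appears."""
    anc = set(parent)
    while True:
        new = {(d, p) for d, a in anc for n, p in parent if n == a} - anc
        if not new:
            return anc
        anc |= new


ANCESTOR = ancestor_relation(PARENT)


def ancestors_of(x):
    # pi_aname(sigma_dname=x(ANCESTOR))
    return {a for d, a in ANCESTOR if d == x}


def friends_of_ancestors(x):
    # pi_fname(sigma_dname=x(ANCESTOR) joined with FRIEND on aname = name)
    anc = ancestors_of(x)
    return {f for n, f in FRIEND if n in anc}


def friends_of_ancestors_or_of(x):
    # ... union pi_fname(sigma_name=x(FRIEND))
    return friends_of_ancestors(x) | {f for n, f in FRIEND if n == x}


print(sorted(ancestors_of("John")))                  # ['Richard', 'Wilma']
print(sorted(friends_of_ancestors_or_of("John")))    # []
print(sorted(friends_of_ancestors_or_of("Andrew")))  # ['Joan', 'Michael', 'Robert']
```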

The factory database of Example 2 above includes the base relations SUPPLIER 
and SUBPART. A derived relation COMP, of the form 

COMP (CPART, CSUBPART, CQTY), 

is defined as 

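$$\mathrm{SUBPART}\;\Phi_{\mathit{spart}=\mathit{csubpart}:\;\mathit{cpart},\,\mathit{ssubpart},\,\mathit{cqty}\times\mathit{sqty}}\;(\mathrm{SUBPART}, \mathrm{COMP})$$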
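The fixpoint operator may then be used to express various recursive queries such as a
query of the form "Find the location and quantities of any parts that go into making a
Locomotive" as follows:

$$\pi_{\mathit{csubpart},\,\mathit{city},\,\mathrm{sum}(\mathit{cqty})}\Bigl(\mathrm{SUPPLIER} \bowtie_{\mathit{part}=\mathit{csubpart}} \mathrm{GROUPBY}_{\mathit{csubpart}:\;\mathit{csubpart},\,\mathrm{sum}(\mathit{cqty})}\bigl(\sigma_{\mathit{cpart}=\text{"Locomotive"}}(\mathrm{SUBPART}\;\Phi_{\mathit{spart}=\mathit{csubpart}:\;\mathit{cpart},\,\mathit{ssubpart},\,\mathit{cqty}\times\mathit{sqty}}\;(\mathrm{SUBPART}, \mathrm{COMP}))\bigr)\Bigr)$$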
wherein the subscripts in the GROUPBY operator indicate the group-by column and
output columns of the GROUPBY operator.
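Locomotive does not occur in the sample SUBPART entries, so this Python sketch (names hypothetical) runs the same pipeline for Carburetor. Two assumptions are ours: the derived COMP relation is kept as a bag so that equal quantities reached through different paths are not merged, and the part hierarchy is assumed to be acyclic, which guarantees that the path extension stops.

```python
from collections import defaultdict

# SUBPART(SPART, SSUBPART, SQTY) and SUPPLIER(PART, SUPPLIER, CITY): sample entries.
SUBPART = [
    ("Carburetor", "Valve Assembly", 2),
    ("Carburetor", '1/4" Screw', 16),
    ("Carburetor", "Short Spring", 3),
    ("Valve Assembly", "Needle Valve", 1),
    ("Valve Assembly", '1/4" Screw', 4),
]
SUPPLIER = [
    ("Needle Valve", "Bill's Brass Works", "Buffalo"),
    ("Needle Valve", "Paul's Plumbing Mfg. Co.", "Pittsburgh"),
    ('1/4" Screw', "Mac's Machine Shop", "Milwaukee"),
    ("Short Spring", "Sam's Spring Specialties", "Springfield"),
]


def comp(subpart):
    """COMP(CPART, CSUBPART, CQTY): every containment path, with quantities
    multiplied along the path (cqty * sqty). Kept as a list (bag)."""
    result = list(subpart)
    delta = list(subpart)
    while delta:  # extend only the paths found in the previous iteration
        delta = [
            (cpart, ssub, cqty * sqty)
            for cpart, csub, cqty in delta
            for spart, ssub, sqty in subpart
            if spart == csub
        ]
        result.extend(delta)
    return result


def parts_and_locations(top):
    totals = defaultdict(int)  # GROUPBY csubpart: csubpart, sum(cqty)
    for cpart, csub, cqty in comp(SUBPART):
        if cpart == top:       # selection cpart = top
            totals[csub] += cqty
    # join SUPPLIER on part = csubpart; project csubpart, city, sum(cqty)
    return {(part, city, totals[part]) for part, _s, city in SUPPLIER if part in totals}


for row in sorted(parts_and_locations("Carburetor")):
    print(row)
```

For Carburetor the sketch gives 24 quarter-inch screws (16 directly plus 2 × 4 through the valve assemblies), 2 needle valves and 3 short springs; the valve assembly itself drops out of the join because no SUPPLIER entry lists it.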

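Similarly, in the airline reservation database of Example 3, a derived relation
SHORT_CONNECTION of the form

SHORT_CONNECTION (START, DESTINATION, CDTIME, CATIME, CCOST)

is defined as

$$\mathrm{FLIGHT}\;\Phi_{\mathit{destination}=\mathit{from},\;\mathit{catime}<\mathit{dtime},\;\mathit{distance}<1000:\;\mathit{start},\,\mathit{to},\,\mathit{cdtime},\,\mathit{atime},\,\mathit{ccost}+\mathit{cost}}\;(\mathrm{FLIGHT}, \mathrm{SHORT\_CONNECTION})$$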

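The fixpoint operator is used, for example, to express a query of the form "Find
the minimum cost of all flights between London and San Francisco under the condition
that the distance between each pair of connecting points is less than 1,000 miles" as
follows:

$$\mathrm{Min}_{\mathit{cost}}\Bigl(\sigma_{\mathit{start}=\text{"SF"},\;\mathit{destination}=\text{"London"}}\bigl(\sigma_{\mathit{distance}<1000}(\mathrm{FLIGHT})\;\Phi_{\mathit{destination}=\mathit{from},\;\mathit{catime}<\mathit{dtime},\;\mathit{distance}<1000:\;\mathit{start},\,\mathit{to},\,\mathit{cdtime},\,\mathit{atime},\,\mathit{ccost}+\mathit{cost}}\;(\mathrm{FLIGHT}, \mathrm{SHORT\_CONNECTION})\bigr)\Bigr)$$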
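The sample data of Table IV contain no pair of connecting legs that are both shorter than 1,000 miles, so this Python sketch uses a small hypothetical FLIGHT relation with cities A through D and times in minutes after midnight. It filters the legs on distance once, mirroring the selection on the initial input and the distance predicate in the condition expression, and then iterates. Termination relies on our assumption that every flight arrives after it departs, so each chain of connections moves forward in time. All names and data are illustrative.

```python
# Hypothetical FLIGHT(FROM, TO, DISTANCE, DTIME, ATIME, COST) rows; times are
# minutes after midnight so that catime < dtime is a plain integer comparison.
FLIGHT = [
    ("A", "B", 400, 480, 540, 100),
    ("B", "D", 600, 600, 690, 150),
    ("A", "C", 700, 500, 600, 80),
    ("C", "D", 500, 620, 700, 90),
    ("C", "D", 500, 590, 660, 30),    # leaves before the A-C flight lands
    ("A", "D", 1500, 450, 600, 120),  # a leg of 1,000 miles or more
]


def short_connections(flights, max_leg=1000):
    """SHORT_CONNECTION(START, DESTINATION, CDTIME, CATIME, CCOST)."""
    legs = [f for f in flights if f[2] < max_leg]  # distance < 1000
    result = {(frm, to, dt, at, cost) for frm, to, _dist, dt, at, cost in legs}
    delta = set(result)
    while delta:
        derived = {
            (start, to, cdt, at, ccost + cost)  # OE: start, to, cdtime, atime, ccost + cost
            for start, dest, cdt, cat, ccost in delta
            for frm, to, _dist, dt, at, cost in legs
            if dest == frm and cat < dt         # CE: destination = from, catime < dtime
        }
        delta = derived - result
        result |= delta
    return result


def min_cost(flights, start, destination):
    costs = [c for s, d, _cdt, _cat, c in short_connections(flights)
             if s == start and d == destination]
    return min(costs, default=None)


print(min_cost(FLIGHT, "A", "D"))  # 170
```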

A main task to be performed by a query optimizer is to rearrange the sequence of 
operations in an expression of a query for more efficient evaluation. Starting with an 
initial form generated by a parser, the query expression usually undergoes a sequence of
transformations based upon certain heuristic rules or execution cost comparisons. The 
transformations usually include: 

Performing selections and projections as early as possible, 

Combining sequences of the same operation (e.g. selection or projection) into one 
operation, 

Commuting selection or projection operations with join or Cartesian product 
operations, 

Commuting join (or Cartesian product) operations, and 
Re-associating join (or Cartesian product) operations. 
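The first listed transformation, performing selections early, is what makes it worthwhile to move a selection inside a fixpoint. A Python sketch under our own reading: in the ANCESTOR definition the DNAME column is copied unchanged from each recursive tuple to its output, so selecting DNAME = "Andrew" after the fixpoint gives the same answer as selecting NAME = "Andrew" on the initial PARENT input, while deriving only Andrew's tuples. The helper name is hypothetical. A selection on ANAME could not be moved this way, because ANAME takes a new value at every iteration.

```python
# PARENT(NAME, PNAME) from Table I.
PARENT = {
    ("Andrew", "William"), ("Andrew", "Mary"),
    ("Mary", "John"), ("Mary", "Anne"),
    ("John", "Richard"), ("John", "Wilma"),
}


def ancestor_fixpoint(initial):
    """initial Phi_{name = aname : dname, pname}(PARENT, ANCESTOR), semi-naive."""
    result, delta = set(initial), set(initial)
    while delta:
        derived = {(d, p) for d, a in delta for n, p in PARENT if n == a}
        delta = derived - result
        result |= delta
    return result


# Plan 1: build the whole ANCESTOR relation, then select dname = "Andrew".
late = {row for row in ancestor_fixpoint(PARENT) if row[0] == "Andrew"}

# Plan 2: select name = "Andrew" on the initial input, then iterate.
early = ancestor_fixpoint({row for row in PARENT if row[0] == "Andrew"})

print(late == early)                               # True
print(len(ancestor_fixpoint(PARENT)), len(early))  # 12 6
```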
(See, generally, Ullman, J. D., Principles of Database Systems (2nd ed.), Computer Science
Press, Maryland, 1982 for more discussion of transformations).

These transformations are made possible by properties (such as commutativity and 
associativity) which are inherent in these operators. In order to avoid the development of 
a new paradigm for dealing with recursive queries, the present invention stays within the 
realm of relational algebra so that existing efficient implementation algorithms and other 
useful techniques (e.g. optimal plan search mechanisms) can be employed. However, not 
all of the common algebraic properties hold between fixpoint operators and other
relational algebra operators. To integrate the fixpoint operator into an existing query 
optimization process, the valid transformation rules in the extended relational algebra 
have to be identified. 

Example 4 above (the study of the sexually transmitted virus) will be used to 
illustrate these rules. 

Assume that one of the questions under investigation is the spread of the virus by heterosexual transmission. To this end, a record has been kept of all heterosexual encounters in a population under study. The ENCNTR relation is used to store these records. The PERSON relation contains additional information about each person in the population under study. Now consider the two derived relations EXPSDFML (male, female, date, city, county) and EXPSDML (female, male, date, city, county). The relation EXPSDFML is defined as:

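$$
\begin{aligned}
\mathrm{EXPSDFML} = \pi_{f_1,f_2,f_3,f_4,f_5}\Big(&\big(\pi_{e_1,e_2,e_3,e_4,e_5}E,\ \pi_{e_2,e_1,e_3,e_4,e_5}E\big) \\
&\Phi_{[f_2=m_1,\ f_3<m_3,\ m_2=e_1,\ m_3<e_3\,:\,f_1,e_2,e_3,e_4,e_5],\ [m_2=f_1,\ m_3<f_3,\ f_2=e_2,\ f_3<e_3\,:\,m_1,e_1,e_3,e_4,e_5]} \\
&\big((E,F,M),\ (E,F,M)\big)\Big)
\end{aligned}
$$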
The projection is needed since the output of the fixpoint operator is, by definition, the cross product of relations F and M. There is no need to compute the cross product of F and M in such a case, and the query processor will detect that. Similarly, the relation EXPSDML is defined as:

$$
\begin{aligned}
\mathrm{EXPSDML} = \pi_{m_1,m_2,m_3,m_4,m_5}\Big(&\big(\pi_{e_1,e_2,e_3,e_4,e_5}E,\ \pi_{e_2,e_1,e_3,e_4,e_5}E\big) \\
&\Phi_{[f_2=m_1,\ f_3<m_3,\ m_2=e_1,\ m_3<e_3\,:\,f_1,e_2,e_3,e_4,e_5],\ [m_2=f_1,\ m_3<f_3,\ f_2=e_2,\ f_3<e_3\,:\,m_1,e_1,e_3,e_4,e_5]} \\
&\big((E,F,M),\ (E,F,M)\big)\Big)
\end{aligned}
$$
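A minimal Python sketch of this mutually recursive definition, assuming encounter tuples (e1, e2, e3, e4, e5) = (male, female, date, city, county) with integer dates and invented sample data. It uses naive iteration: each round re-evaluates both condition/output expressions against the full current F and M, and stops when neither relation grows. Since every derived value is drawn from E, both relations stay finite and the iteration terminates.

```python
def expsd(encounters):
    """Naive fixpoint for the mutually recursive EXPSDFML (F) and EXPSDML (M).

    Each encounter is (e1, e2, e3, e4, e5) = (male, female, date, city, county).
    """
    F = {(e1, e2, e3, e4, e5) for (e1, e2, e3, e4, e5) in encounters}  # pi_{e1,e2,e3,e4,e5} E
    M = {(e2, e1, e3, e4, e5) for (e1, e2, e3, e4, e5) in encounters}  # pi_{e2,e1,e3,e4,e5} E
    while True:
        # [f2=m1, f3<m3, m2=e1, m3<e3 : f1, e2, e3, e4, e5]
        new_f = {(f1, e2, e3, e4, e5)
                 for (f1, f2, f3, _, _) in F
                 for (m1, m2, m3, _, _) in M if f2 == m1 and f3 < m3
                 for (e1, e2, e3, e4, e5) in encounters if m2 == e1 and m3 < e3}
        # [m2=f1, m3<f3, f2=e2, f3<e3 : m1, e1, e3, e4, e5]
        new_m = {(m1, e1, e3, e4, e5)
                 for (m1, m2, m3, _, _) in M
                 for (f1, f2, f3, _, _) in F if m2 == f1 and m3 < f3
                 for (e1, e2, e3, e4, e5) in encounters if f2 == e2 and f3 < e3}
        if new_f <= F and new_m <= M:
            return F, M
        F |= new_f
        M |= new_m


encounters = [
    ("X", "A", 1, "NY", "Kings"),
    ("Y", "A", 2, "NY", "Kings"),
    ("Y", "B", 3, "NY", "Queens"),
    ("Z", "C", 0, "LA", "LA"),
]
F, M = expsd(encounters)
print(sorted({f2 for (f1, f2, *_) in F if f1 == "X"}))  # ['A', 'B']
```

In this data, X meets A on day 1, A meets Y on day 2, and Y meets B on day 3, so B appears in F as indirectly exposed through X.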

Tables V and VI above illustrate the ENCOUNTER(E) and PERSON(P) relations. The EXPSDFML(F) and EXPSDML(M) relations are illustrated in the following Tables VII and VIII.



EXPSDFML relation (F)

| f1 | f2 | f3 | f4 | f5 |
|------|--------|------|------|--------|
| MALE | FEMALE | DATE | CITY | COUNTY |

TABLE VII



EXPSDML relation (M)

| m1 | m2 | m3 | m4 | m5 |
|--------|------|------|------|--------|
| FEMALE | MALE | DATE | CITY | COUNTY |

TABLE VIII



Now, consider query Q1, which finds all vaccinated females who might have been exposed to the virus either directly or indirectly through male carrier X and such that all encounters leading to each female have taken place in New York. The query is expressed in the extended relational algebra as follows:

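$$
\begin{aligned}
\pi_{f_2}\Big(\sigma_{f_1=\text{"X"},\ G{:}\,f_4=\text{"NY"},\ G{:}\,m_4=\text{"NY"},\ p_3=\text{true}}\Big(P \bowtie_{p_1=f_2} \big(&(\pi_{e_1,e_2,e_3,e_4,e_5}E,\ \pi_{e_2,e_1,e_3,e_4,e_5}E) \\
&\Phi_{[f_2=m_1,\ f_3<m_3,\ m_2=e_1,\ m_3<e_3\,:\,f_1,e_2,e_3,e_4,e_5],\ [m_2=f_1,\ m_3<f_3,\ f_2=e_2,\ f_3<e_3\,:\,m_1,e_1,e_3,e_4,e_5]} \\
&((E,F,M),\ (E,F,M))\big)\Big)\Big)
\end{aligned}
$$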

Note that a "G" tag attached to a selection condition indicates that it is a global one. A selection predicate is said to be global if it is applied at each iteration during the generation of the recursive relation defined by the fixpoint operation. In other words, once a tuple of an input relation fails to satisfy a global selection predicate, it will be excluded from consideration in any of the subsequent recursive computations. (The detection of global predicates will be discussed later.)
In general, for queries on recursive relations, a detailed semantic analysis is required to determine the legal transformations that can be applied to an algebraic expression. The present discussion is limited to transformation rules which do not require any semantic query analysis. Formal proofs of the rules are omitted because of their length; instead, each rule is motivated either with a sketch of a proof or with detailed examples.

1. Commuting projection operation with fixpoint operation 

Rule 1: Let $IINP_i$ and $RINP_i$ be the initial and recursive input expressions for mutually recursive relation $R_i$ of the fixpoint operation. $PE_i$ denotes the set of attributes of $R_i$ in the projection operation. $SE_i$ denotes the set of attributes of $R_i$ in the selection operation. $CE_i$ and $OE_i$ denote the sets of attributes in the condition expression and output expression of $R_i$ in the fixpoint operation. $A(IINP_i)$ denotes the set of attributes in the initial input expression and $A(RINP_i)$ the set of attributes in the recursive input expression for relation $R_i$. Then

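$$
\begin{aligned}
&\pi_{PE_1,\dots,PE_n}\Big(\sigma_{SE_1,\dots,SE_n}\big((IINP_1,\dots,IINP_n)\ \Phi_{CE_1:OE_1,\ \dots,\ CE_n:OE_n}\ (RINP_1,\dots,RINP_n)\big)\Big) \\
&\equiv \pi_{PE_1,\dots,PE_n}\Big(\sigma_{SE_1,\dots,SE_n}\big((\pi_{PE'_1}IINP_1,\dots,\pi_{PE'_n}IINP_n)\ \Phi_{CE_1:OE'_1,\ \dots,\ CE_n:OE'_n}\ (\pi_{PE''_1}RINP_1,\dots,\pi_{PE''_n}RINP_n)\big)\Big)
\end{aligned}
$$

where

$$
\begin{aligned}
PE'_i &= \Big(PE_i \cup SE_i \cup \bigcup_{j=1}^{n} CE_j\Big) \cap A(IINP_i),\\
PE''_i &= \Big(PE_i \cup SE_i \cup \bigcup_{j=1}^{n} CE_j\Big) \cap A(RINP_i),\ \text{and}\\
OE'_i &= \Big(PE_i \cup SE_i \cup \bigcup_{j=1}^{n} CE_j\Big) \cap OE_i.
\end{aligned}
$$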
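The three set formulas of Rule 1 can be computed directly. The Python sketch below performs only that set arithmetic and not the rewrite itself; the attribute names and the two-relation setup are hypothetical.

```python
def rule1_sets(i, PE, SE, CE, OE, A_IINP, A_RINP):
    """Return (PE'_i, PE''_i, OE'_i) as defined in Rule 1.

    PE, SE, CE, OE, A_IINP and A_RINP are lists of attribute sets indexed by relation.
    """
    needed = PE[i] | SE[i] | set().union(*CE)  # PE_i | SE_i | (CE_1 | ... | CE_n)
    return needed & A_IINP[i], needed & A_RINP[i], needed & OE[i]


# Hypothetical attribute sets for two mutually recursive relations.
PE = [{"a"}, {"p"}]
SE = [{"b"}, set()]
CE = [{"c"}, {"d"}]
OE = [{"a", "b", "e"}, {"p", "q"}]
A_IINP = [{"a", "b", "c", "d", "e"}, {"p", "q"}]
A_RINP = [{"a", "c", "x"}, {"p", "d", "q"}]

pe1, pe2, oe = rule1_sets(0, PE, SE, CE, OE, A_IINP, A_RINP)
print(sorted(pe1), sorted(pe2), sorted(oe))  # ['a', 'b', 'c', 'd'] ['a', 'c'] ['a', 'b']
```

Attributes such as "e" and "x" are never referenced by the projection, the selection, or any condition expression, so they are pruned from the inputs and from the output expression.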

Sketch of proof: In principle, values for columns not required in any subsequent operations can be discarded. Therefore, during the processing of inputs for the fixpoint operation, only the values for columns referenced in the condition or output expressions of the fixpoint operator or in other subsequent operations need to be retrieved. For example, using query Q1, this rule can be applied to transform it as follows:

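$$
\begin{aligned}
&\pi_{f_2}\Big(\sigma_{f_1=\text{"X"},\ G{:}\,f_4=\text{"NY"},\ G{:}\,m_4=\text{"NY"},\ p_3=\text{true}}\Big(P \bowtie_{p_1=f_2} \big((\pi_{e_1,e_2,e_3,e_4,e_5}E,\ \pi_{e_2,e_1,e_3,e_4,e_5}E) \\
&\qquad \Phi_{[f_2=m_1,\ f_3<m_3,\ m_2=e_1,\ m_3<e_3\,:\,f_1,e_2,e_3,e_4,e_5],\ [m_2=f_1,\ m_3<f_3,\ f_2=e_2,\ f_3<e_3\,:\,m_1,e_1,e_3,e_4,e_5]}\ ((E,F,M),\ (E,F,M))\big)\Big)\Big) \\
&\equiv \pi_{f_2}\Big(\sigma_{f_1=\text{"X"},\ G{:}\,f_4=\text{"NY"},\ G{:}\,m_4=\text{"NY"},\ p_3=\text{true}}\Big(P \bowtie_{p_1=f_2} \big((\pi_{e_1,e_2,e_3,e_4}E,\ \pi_{e_2,e_1,e_3,e_4}E) \\
&\qquad \Phi_{[f_2=m_1,\ f_3<m_3,\ m_2=e_1,\ m_3<e_3\,:\,f_1,e_2,e_3,e_4],\ [m_2=f_1,\ m_3<f_3,\ f_2=e_2,\ f_3<e_3\,:\,m_1,e_1,e_3,e_4]} \\
&\qquad ((\pi_{e_1,e_2,e_3,e_4}E,\ \pi_{f_1,f_2,f_3,f_4}F,\ \pi_{m_1,m_2,m_3,m_4}M),\ (\pi_{e_1,e_2,e_3,e_4}E,\ \pi_{f_1,f_2,f_3,f_4}F,\ \pi_{m_1,m_2,m_3,m_4}M))\big)\Big)\Big)
\end{aligned}
$$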

2. Commuting selection operations with fixpoint operations. 

This discussion will explore the heuristic of "performing selection as early as possible". In general, it means moving the selections inside other operators as far as possible. For query Q1, all of its selections can be applied to base relations E and P directly rather than to the final result composed by the join operation and fixpoint operation. The original expression is then translated into:

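$$
\begin{aligned}
\pi_{f_2}\Big(\sigma_{p_3=\text{true}}P \bowtie_{p_1=f_2} \big(&(\sigma_{e_1=\text{"X"},\ e_4=\text{"NY"}}(\pi_{e_1,e_2,e_3,e_4,e_5}E),\ \sigma_{e_4=\text{"NY"}}(\pi_{e_2,e_1,e_3,e_4,e_5}E)) \\
&\Phi_{[f_2=m_1,\ f_3<m_3,\ m_2=e_1,\ m_3<e_3\,:\,f_1,e_2,e_3,e_4,e_5],\ [m_2=f_1,\ m_3<f_3,\ f_2=e_2,\ f_3<e_3\,:\,m_1,e_1,e_3,e_4,e_5]} \\
&((\sigma_{e_4=\text{"NY"}}E,\ F,\ M),\ (\sigma_{e_4=\text{"NY"}}E,\ F,\ M))\big)\Big)
\end{aligned}
$$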

In the above expression, the global selections ($G{:}\,f_4=\text{"NY"}$, $G{:}\,m_4=\text{"NY"}$) are converted to regular selections and applied to both the initial and recursive inputs of the fixpoint operation. The purpose of this translation is to reduce the sizes of the operands for each operation. If the fixpoint operator $\Phi$ is viewed as a generator of a directed graph consisting of all paths leading to all possible answers, an early selection on the initial input and recursive input relations has the effect of eliminating the unqualified paths in the graph before they are generated. See, generally, Ioannidis, Y., and Wong, W., "On the Computation of the Transitive Closure of Relational Operators", Proc. of 12th Int. Conf. on VLDB, Tokyo, Japan, August 1986; Jagadish, H., Agrawal, R., and Ness, L., "A Study of Transitive Closure as a Recursive Mechanism", Proc. of ACM-SIGMOD 1987 Int. Conf. on Management of Data, San Francisco, California, May 1987; and Lu, H., "New Strategies for Computing the Transitive Closure of a Database Relation", Proc. of 13th Int. Conf. on VLDB, Brighton, England, September 1987.

While the legality of moving global selections into the fixpoint operator will be apparent, the movement of the non-global selection ($\sigma_{f_1=\text{"X"}}$) into the initial input needs some explanation. On careful examination, it can be seen that the two forms of the query are semantically equivalent. This is because the selection ($\sigma_{f_1=\text{"X"}}$) is applied to a direct mapping column of the recursive relation EXPSDFML. A column of a recursive relation is considered a direct mapping column if the output expression for that column consists of that column itself only. The crucial characteristic of a direct mapping column is that it acquires its entire set of values from the initial input relation(s). The values it assumes during each subsequent recursive iteration are always taken from the value set of its own initial input and are not computed from values of other sources. Thus, once the set of values of the initial input is determined, no new values are added to the column. A careful look at the definition of EXPSDFML will show that new values are added to columns $f_2, \dots, f_5$ during each recursive iteration but not to column $f_1$.
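The direct mapping argument can be checked on a simplified, single-relation analogue: a hypothetical reachability relation, not EXPSDFML itself. Here R(src, dst) has initial input E and condition/output [dst = frm : src, to], so src, like $f_1$, is a direct mapping column. The Python sketch computes the selection src = "X" both after the fixpoint and pushed into the initial input only, and the two agree on the sample edges.

```python
def closure(initial, edges):
    """Naive fixpoint for R = initial Phi[dst = frm : src, to] (edges, R)."""
    result = set(initial)
    while True:
        new = {(src, to) for (src, dst) in result for (frm, to) in edges if dst == frm}
        if new <= result:
            return result
        result |= new


edges = {("X", "A"), ("A", "B"), ("B", "C"), ("Y", "C")}
# Selection on the direct mapping column src, applied after the fixpoint ...
after = {t for t in closure(edges, edges) if t[0] == "X"}
# ... and pushed into the initial input only (the recursive input is untouched).
pushed = closure({t for t in edges if t[0] == "X"}, edges)
print(after == pushed, sorted(pushed))  # True [('X', 'A'), ('X', 'B'), ('X', 'C')]
```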

The concept of direct mapping column is very similar to that of invariant column 
introduced in Devanbu, P. and Agrawal, R., "Moving Selections Into Fixpoint Queries", 
Proc. of 4th Int. Conf. on Data Engineering, Los Angeles, February 1988. Direct
mapping columns are actually a subset of invariant columns. However, the detection of 
the more general invariant columns requires a detailed analysis of the selection predicates. 
The advantage of concentrating on direct mapping columns is that their detection is trivial 
and they cover the majority of the cases for which the payoff for performing early 
selections is substantial. 
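Because a direct mapping column is defined purely syntactically, detecting one can be as simple as comparing each column with its own output expression. The Python sketch below assumes, for illustration only, that the output expression is available as a list of per-column strings aligned with the column list; a real system catalog would more likely store parsed expressions.

```python
def direct_mapping_columns(columns, output_exprs):
    """Return the columns whose output expression is exactly the column itself."""
    return [col for col, expr in zip(columns, output_exprs) if expr.strip() == col]


f_cols = ["f1", "f2", "f3", "f4", "f5"]
f_out = ["f1", "e2", "e3", "e4", "e5"]   # output expression for EXPSDFML
m_cols = ["m1", "m2", "m3", "m4", "m5"]
m_out = ["m1", "e1", "e3", "e4", "e5"]   # output expression for EXPSDML
print(direct_mapping_columns(f_cols, f_out), direct_mapping_columns(m_cols, m_out))
# ['f1'] ['m1']
```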

Selections which are neither global nor applied to direct mapping columns will now be considered. Let Q2 be the query to find all females who might have been exposed to the virus either directly or indirectly through male carrier X and such that the last encounter took place in New York. The query is expressed in the extended relational algebra as follows:

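$$
\begin{aligned}
\pi_{f_2}\Big(\sigma_{f_1=\text{"X"},\ f_4=\text{"NY"}}\big(&(\pi_{e_1,e_2,e_3,e_4,e_5}E,\ \pi_{e_2,e_1,e_3,e_4,e_5}E) \\
&\Phi_{[f_2=m_1,\ f_3<m_3,\ m_2=e_1,\ m_3<e_3\,:\,f_1,e_2,e_3,e_4,e_5],\ [m_2=f_1,\ m_3<f_3,\ f_2=e_2,\ f_3<e_3\,:\,m_1,e_1,e_3,e_4,e_5]} \\
&((E,F,M),\ (E,F,M))\big)\Big)
\end{aligned}
$$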

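Distributing the selection ($\sigma_{f_4=\text{"NY"}}$) over the fixpoint operator would result in:

$$
\begin{aligned}
\pi_{f_2}\Big(\sigma_{f_1=\text{"X"}}\big(&(\sigma_{e_4=\text{"NY"}}(\pi_{e_1,e_2,e_3,e_4,e_5}E),\ \sigma_{e_4=\text{"NY"}}(\pi_{e_2,e_1,e_3,e_4,e_5}E)) \\
&\Phi_{[f_2=m_1,\ f_3<m_3,\ m_2=e_1,\ m_3<e_3\,:\,f_1,e_2,e_3,e_4,e_5],\ [m_2=f_1,\ m_3<f_3,\ f_2=e_2,\ f_3<e_3\,:\,m_1,e_1,e_3,e_4,e_5]} \\
&((\sigma_{e_4=\text{"NY"}}E,\ F,\ M),\ (\sigma_{e_4=\text{"NY"}}E,\ F,\ M))\big)\Big)
\end{aligned}
$$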

However, the above query expression will not generate the complete set of answers. This is due to the exclusion of qualified "bridge tuples" from the intermediate results used for computing the EXPSDFML and EXPSDML relations. According to the above expressions, only those tuples satisfying ($\sigma_{e_4=\text{"NY"}}E$) participate in the initial and recursive inputs to the relation EXPSDFML. In order to produce the complete set of answers, all qualified "bridge tuples" need to be included. This means that all tuples which do not satisfy ($\sigma_{e_4=\text{"NY"}}E$) must still be saved for subsequent computation to avoid the loss of certain answers.
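The loss of bridge tuples can be reproduced on a small hypothetical example. In the Python sketch below, each tuple is (src, dst, city), and the recursive step replaces city with the city of the newly appended encounter, so city behaves like $f_4$: it is neither a direct mapping column nor subject to a global predicate. Pushing city = "NY" into both inputs discards the bridge tuple ("X", "A", "LA") and, with it, the only path from X to B.

```python
def closure(initial, edges):
    """Naive fixpoint: (src, dst, city) extended by an edge (frm, to, city2) with dst == frm."""
    result = set(initial)
    while True:
        new = {(src, to, city) for (src, dst, _) in result
               for (frm, to, city) in edges if dst == frm}
        if new <= result:
            return result
        result |= new


# Hypothetical encounters: X -> A happened in LA, A -> B happened in NY.
edges = {("X", "A", "LA"), ("A", "B", "NY")}

# Correct: compute the fixpoint first, then select paths whose LAST hop is in NY.
correct = {t for t in closure(edges, edges) if t[0] == "X" and t[2] == "NY"}

# Incorrect: push the non-global selection city = "NY" into both inputs.
ny_edges = {e for e in edges if e[2] == "NY"}
pushed = {t for t in closure(ny_edges, ny_edges) if t[0] == "X"}

print(correct, pushed)  # {('X', 'B', 'NY')} set()
```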

The rules governing the movement of selections into fixpoint operators thus depend on whether the selections are global and, if not global, whether they apply to direct mapping columns. They are formulated as follows:

A. Commuting selection predicates on direct mapping columns with fixpoint operations

Rule 2: Let $SED_i$ be the set of selection predicates on direct mapping columns for mutually recursive relation $R_i$, where $SED_i$ contains no reference to any column of mutually recursive relation $R_j$ where $i \neq j$. Then

$$
\begin{aligned}
&\sigma_{SED_1,\dots,SED_n}\big((IINP_1,\dots,IINP_n)\ \Phi_{CE_1:OE_1,\ \dots,\ CE_n:OE_n}\ (RINP_1,\dots,RINP_n)\big) \\
&\equiv (\sigma_{SED_1}IINP_1,\dots,\sigma_{SED_n}IINP_n)\ \Phi_{CE_1:OE_1,\ \dots,\ CE_n:OE_n}\ (RINP_1,\dots,RINP_n)
\end{aligned}
$$

For example, as seen earlier, this rule was used to move the selection predicate ($f_1=\text{"X"}$) into input relation E for query Q1.

B. Commuting global selection predicates with fixpoint operations

Rule 3: Let $SEG_i$ be the set of all global predicates in the selection expression referencing attributes of mutually recursive relation $R_i$, where $SEG_i$ contains no reference to any column of mutually recursive relation $R_j$ where $i \neq j$; and let $SEG'_i$ be the same set of predicates after the "G" tag has been removed. Then

$$
\begin{aligned}
&\sigma_{SEG_1,\dots,SEG_n}\big((IINP_1,\dots,IINP_n)\ \Phi_{CE_1:OE_1,\ \dots,\ CE_n:OE_n}\ (RINP_1,\dots,RINP_n)\big) \\
&\equiv (\sigma_{SEG'_1}IINP_1,\dots,\sigma_{SEG'_n}IINP_n)\ \Phi_{CE_1:OE_1,\ \dots,\ CE_n:OE_n}\ (\sigma_{SEG'_1}RINP_1,\dots,\sigma_{SEG'_n}RINP_n)
\end{aligned}
$$
For example, as seen earlier, this rule was used to move the global selection predicates ($f_4=\text{"NY"}$) and ($m_4=\text{"NY"}$) in query Q1 into the initial and recursive inputs of the fixpoint operator.
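To illustrate Rule 3 under simplifying assumptions, the Python sketch below uses a hypothetical (src, dst, city) reachability relation. Since each derived tuple's city is that of the encounter appended last, testing the global predicate city = "NY" on every tuple at every iteration amounts to requiring that every hop on the path be in New York. The sketch checks that this gives the same result as removing the "G" tag and applying the predicate to the initial and recursive inputs.

```python
def closure(initial, edges, keep=lambda t: True):
    """Naive fixpoint that applies the predicate `keep` to every tuple at every iteration."""
    result = {t for t in initial if keep(t)}
    while True:
        new = {(src, to, city) for (src, dst, _) in result
               for (frm, to, city) in edges if dst == frm}
        new = {t for t in new if keep(t)}  # the global predicate is re-checked each round
        if new <= result:
            return result
        result |= new


def in_ny(t):
    return t[2] == "NY"


edges = {("X", "A", "NY"), ("A", "B", "NY"), ("B", "C", "LA"), ("A", "D", "NY")}

# Global predicate enforced inside the iteration ...
global_version = closure(edges, edges, keep=in_ny)
# ... versus Rule 3: the untagged predicate applied to the initial and recursive inputs.
ny_edges = {e for e in edges if in_ny(e)}
rule3_version = closure(ny_edges, ny_edges)
print(global_version == rule3_version)  # True
```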



The designation of global predicates is left to the user (see, generally, Rosenthal, A., Heiler, S., Dayal, U., and Manola, F., "Traversal Recursion: A Practical Approach to Support Recursive Applications", Proc. of ACM-SIGMOD 1986 Int. Conf. on Management of Data, Washington, D.C., May 1986). As explained earlier, the detection of direct mapping columns is straightforward. The detection of such columns may be performed when a recursive relation is defined, and that information may be permanently stored in the system catalogs.
Thus far, selections wherein all the columns involved belong to a single mutually recursive relation have been considered. Selections of the form $f_i\,\theta\,f_j$, where $f_i$ is a column of mutually recursive relation $R_i$, $f_j$ is a column of mutually recursive relation $R_j$ where $i \neq j$, and $\theta$ is a comparison operator, will now be discussed.

Consider query Q3, which finds all males who might have been indirectly exposed to the virus through male carrier X. The query is expressed as follows:

$$
\begin{aligned}
\pi_{m_2}\Big(\sigma_{f_1=\text{"X"},\ f_2=m_1}\big(&(\pi_{e_1,e_2,e_3,e_4,e_5}E,\ \pi_{e_2,e_1,e_3,e_4,e_5}E) \\
&\Phi_{[f_2=m_1,\ f_3<m_3,\ m_2=e_1,\ m_3<e_3\,:\,f_1,e_2,e_3,e_4,e_5],\ [m_2=f_1,\ m_3<f_3,\ f_2=e_2,\ f_3<e_3\,:\,m_1,e_1,e_3,e_4,e_5]} \\
&((E,F,M),\ (E,F,M))\big)\Big)
\end{aligned}
$$

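The selection $f_2 = m_1$ is nothing but a join of the two recursive relations EXPSDFML and EXPSDML. This will be detected by the query optimizer, and query Q3 will be translated to:

$$
\begin{aligned}
\pi_{m_2}\Big(&\sigma_{f_1=\text{"X"}}\Big(\pi_F\big((\pi_{e_1,e_2,e_3,e_4,e_5}E,\ \pi_{e_2,e_1,e_3,e_4,e_5}E) \\
&\qquad \Phi_{[f_2=m_1,\ f_3<m_3,\ m_2=e_1,\ m_3<e_3\,:\,f_1,e_2,e_3,e_4,e_5],\ [m_2=f_1,\ m_3<f_3,\ f_2=e_2,\ f_3<e_3\,:\,m_1,e_1,e_3,e_4,e_5]}\ ((E,F,M),\ (E,F,M))\big)\Big) \\
&\bowtie_{f_2=m_1}\ \pi_M\big((\pi_{e_1,e_2,e_3,e_4,e_5}E,\ \pi_{e_2,e_1,e_3,e_4,e_5}E) \\
&\qquad \Phi_{[f_2=m_1,\ f_3<m_3,\ m_2=e_1,\ m_3<e_3\,:\,f_1,e_2,e_3,e_4,e_5],\ [m_2=f_1,\ m_3<f_3,\ f_2=e_2,\ f_3<e_3\,:\,m_1,e_1,e_3,e_4,e_5]}\ ((E,F,M),\ (E,F,M))\big)\Big)
\end{aligned}
$$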

At execution time, the fixpoint operation will be performed only once, generating both recursive relations EXPSDFML and EXPSDML simultaneously. A join will then be performed on both relations. That is, no cross products or projections of relations EXPSDFML and EXPSDML will actually take place to evaluate query Q3. This rule is formulated as:

C. Detection of join operations 

Rule 4: Let $f_i\,\theta\,f_j$ be a selection where $f_i$ is a column of mutually recursive relation $R_i$, $f_j$ is a column of mutually recursive relation $R_j$ where $i \neq j$, and $\theta$ is a comparison operator. Then

$$
\begin{aligned}
&\sigma_{f_i\,\theta\,f_j}\big((IINP_1,\dots,IINP_n)\ \Phi_{CE_1:OE_1,\ \dots,\ CE_n:OE_n}\ (RINP_1,\dots,RINP_n)\big) \\
&\equiv \pi_{R_i}\big((IINP_1,\dots,IINP_n)\ \Phi_{CE_1:OE_1,\ \dots,\ CE_n:OE_n}\ (RINP_1,\dots,RINP_n)\big) \\
&\qquad \bowtie_{f_i\,\theta\,f_j}\ \pi_{R_j}\big((IINP_1,\dots,IINP_n)\ \Phi_{CE_1:OE_1,\ \dots,\ CE_n:OE_n}\ (RINP_1,\dots,RINP_n)\big)
\end{aligned}
$$
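Applying Rule 4 only requires knowing which mutually recursive relation each column of a predicate belongs to. The Python sketch below, using a hypothetical Pred record and an owner map from column names to relations, separates a predicate such as $f_2 = m_1$ in query Q3 (columns of two different recursive relations, hence a join) from an ordinary selection such as $f_1 = \text{"X"}$.

```python
from typing import NamedTuple


class Pred(NamedTuple):
    left: str    # column name
    op: str      # comparison operator, e.g. "=" or "<"
    right: str   # column name or quoted constant


def split_predicates(preds, owner):
    """Separate join predicates (Rule 4) from ordinary selection predicates.

    `owner` maps each column name to the mutually recursive relation it belongs to.
    """
    joins, selections = [], []
    for p in preds:
        ri, rj = owner.get(p.left), owner.get(p.right)
        if ri is not None and rj is not None and ri != rj:
            joins.append(p)          # f_i theta f_j with i != j: evaluate as a join
        else:
            selections.append(p)
    return joins, selections


owner = {**{f"f{k}": "F" for k in range(1, 6)}, **{f"m{k}": "M" for k in range(1, 6)}}
q3 = [Pred("f1", "=", '"X"'), Pred("f2", "=", "m1")]
joins, selections = split_predicates(q3, owner)
print(joins, selections)
```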

3. Distributing join operations over fixpoint operations 

In traditional query optimization, selections are moved inside joins to reduce the size of the operands of the join operations. This is usually a good strategy because selections never increase the sizes of their operands. On the other hand, the result of a join operation may be smaller or larger than the sizes of its operands. But the result of a fixpoint operation is never smaller than its initial inputs. Therefore, a query optimizer should never consider moving a fixpoint operator inside a join operator, but should assess the value of moving a join operator inside a fixpoint operator.

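Consider again query Q1. One can safely move the join operator (without the selection) inside the fixpoint operator. The query translates to:

$$
\begin{aligned}
\pi_{f_2}\Big(\sigma_{f_1=\text{"X"},\ G{:}\,f_4=\text{"NY"},\ G{:}\,m_4=\text{"NY"},\ p_3=\text{true}}\big(&(P \bowtie_{p_1=e_2} E,\ \pi_{e_2,e_1,e_3,e_4,e_5}E) \\
&\Phi_{[f_2=m_1,\ f_3<m_3,\ m_2=e_1,\ m_3<e_3\,:\,f_1,e_2,e_3,e_4,e_5,p_2,p_3],\ [m_2=f_1,\ m_3<f_3,\ f_2=e_2,\ f_3<e_3\,:\,m_1,e_1,e_3,e_4,e_5]} \\
&((P \bowtie_{p_1=e_2} E,\ F,\ M),\ (E,\ F,\ M))\big)\Big)
\end{aligned}
$$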

However, the distribution of join operators over fixpoint operators is not always as trivial. As an example, consider a slightly modified schema where the relation PERSON is replaced by the relation VACCINATED-PERSON(person, age), which contains the age of each vaccinated person in the population under study. The relation is depicted in Table IX as:

VACCINATED PERSON relation (V)

| v1 | v2 |
|--------|-----|
| PERSON | AGE |

TABLE IX
Now, let Q4 be the query to find all vaccinated females and their age if they have been exposed to the virus either directly or indirectly through male carrier X. The query is expressed as follows:

$$
\begin{aligned}
\pi_{f_2,v_2}\Big(\sigma_{f_1=\text{"X"}}\Big(V \bowtie_{v_1=f_2} \big(&(\pi_{e_1,e_2,e_3,e_4,e_5}E,\ \pi_{e_2,e_1,e_3,e_4,e_5}E) \\
&\Phi_{[f_2=m_1,\ f_3<m_3,\ m_2=e_1,\ m_3<e_3\,:\,f_1,e_2,e_3,e_4,e_5],\ [m_2=f_1,\ m_3<f_3,\ f_2=e_2,\ f_3<e_3\,:\,m_1,e_1,e_3,e_4,e_5]} \\
&((E,F,M),\ (E,F,M))\big)\Big)\Big)
\end{aligned}
$$

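Attempting to distribute the join operator results in:

$$
\begin{aligned}
\pi_{f_2,v_2}\Big(\sigma_{f_1=\text{"X"}}\big(&(V \bowtie_{v_1=e_2} E,\ \pi_{e_2,e_1,e_3,e_4,e_5}E) \\
&\Phi_{[f_2=m_1,\ f_3<m_3,\ m_2=e_1,\ m_3<e_3\,:\,f_1,e_2,e_3,e_4,e_5,v_2],\ [m_2=f_1,\ m_3<f_3,\ f_2=e_2,\ f_3<e_3\,:\,m_1,e_1,e_3,e_4,e_5]} \\
&((V \bowtie_{v_1=e_2} E,\ F,\ M),\ (E,\ F,\ M))\big)\Big)
\end{aligned}
$$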

However, this transformation will not produce the correct result. This is again due to the loss of qualified "bridge tuples". Although non-vaccinated females are not requested in query Q4, their presence during the iterative process is essential to finding all vaccinated females who have been exposed indirectly to the virus carried by "X". In order to preserve the correct answer, the join operator has to be transformed into a right outer join operator (see, generally, Date, C., Relational Databases: Selected Writings, Addison-Wesley Publishing Company, 1986) as it moves inside the fixpoint operator. During query execution, all tuples which are strictly the result of the outer join may be marked so that they are eliminated from the final result.
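The need for an outer join can be seen on a simplified single-relation analogue. In the Python sketch below (hypothetical data and names), edges stand for encounters, the age map plays the role of VACCINATED-PERSON, and the join is on the destination column, which is not a direct mapping column. Pushing an inner join into the inputs drops the bridge tuple from X to the non-vaccinated A. Keeping every edge and padding a missing age with None, as an outer join would, and discarding the None-marked tuples only at the end, recovers the answer obtained by evaluating the fixpoint first.

```python
def closure(initial, edges):
    """Naive fixpoint: (src, dst, tag) extended by (frm, to, tag2) with dst == frm gives (src, to, tag2)."""
    result = set(initial)
    while True:
        new = {(s, t, a) for (s, d, _) in result for (f, t, a) in edges if d == f}
        if new <= result:
            return result
        result |= new


edges = {("X", "A"), ("A", "B")}
age = {"B": 30}                      # hypothetical VACCINATED-PERSON: A is not vaccinated

# Reference: fixpoint first, then join with VACCINATED-PERSON on the destination.
plain = {(s, d, None) for s, d in edges}
reference = {(d, age[d]) for (s, d, _) in closure(plain, plain) if s == "X" and d in age}

# Inner join pushed inside: the bridge tuple X -> A is lost.
inner = {(s, d, age[d]) for s, d in edges if d in age}
inner_answer = {(d, a) for (s, d, a) in closure(inner, inner) if s == "X"}

# Outer join pushed inside: every edge is kept; None marks outer-join padding.
outer = {(s, d, age.get(d)) for s, d in edges}
outer_answer = {(d, a) for (s, d, a) in closure(outer, outer) if s == "X" and a is not None}

print(reference, inner_answer, outer_answer)  # {('B', 30)} set() {('B', 30)}
```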

If the join is over a direct mapping column, the join operator need not be converted to a right outer join operator. Consider the modified schema again and let Q5 be the query to find all vaccinated males, their ages, and all females who have been exposed to them directly or indirectly. The query is expressed as follows:

$$
\begin{aligned}
\pi_{f_1,v_2,f_2}\Big(V \bowtie_{v_1=f_1} \big(&(\pi_{e_1,e_2,e_3,e_4,e_5}E,\ \pi_{e_2,e_1,e_3,e_4,e_5}E) \\
&\Phi_{[f_2=m_1,\ f_3<m_3,\ m_2=e_1,\ m_3<e_3\,:\,f_1,e_2,e_3,e_4,e_5],\ [m_2=f_1,\ m_3<f_3,\ f_2=e_2,\ f_3<e_3\,:\,m_1,e_1,e_3,e_4,e_5]} \\
&((E,F,M),\ (E,F,M))\big)\Big)
\end{aligned}
$$
The distribution of the join operator results in:
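$$
\begin{aligned}
\pi_{f_1,v_2,f_2}\big(&(V \bowtie_{v_1=e_1} E,\ \pi_{e_2,e_1,e_3,e_4,e_5}E) \\
&\Phi_{[f_2=m_1,\ f_3<m_3,\ m_2=e_1,\ m_3<e_3\,:\,f_1,e_2,e_3,e_4,e_5,v_2],\ [m_2=f_1,\ m_3<f_3,\ f_2=e_2,\ f_3<e_3\,:\,m_1,e_1,e_3,e_4,e_5]} \\
&((E,F,M),\ (E,F,M))\big)
\end{aligned}
$$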

which will produce the correct answers because the join was over a direct mapping column.
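By contrast, when the join is over a direct mapping column, an inner join on the initial input suffices. In the hypothetical Python sketch below, the join with the age map is on src, which the recursive step never changes, so the joined age is simply carried along and the result matches joining after the fixpoint.

```python
def closure(initial, edges):
    """Naive fixpoint over (src, age, dst): src and age are carried unchanged."""
    result = set(initial)
    while True:
        new = {(s, a, t) for (s, a, d) in result for (f, t) in edges if d == f}
        if new <= result:
            return result
        result |= new


def plain_closure(edges):
    """Naive fixpoint over (src, dst) pairs, with no join applied."""
    result = set(edges)
    while True:
        new = {(s, t) for (s, d) in result for (f, t) in edges if d == f}
        if new <= result:
            return result
        result |= new


edges = {("X", "A"), ("A", "B"), ("Y", "B")}
age = {"X": 40}                      # hypothetical VACCINATED relation: person -> age

# Join on src after the fixpoint ...
reference = {(s, age[s], d) for (s, d) in plain_closure(edges) if s in age}
# ... versus an inner join pushed into the initial input only.
pushed = closure({(s, age[s], d) for (s, d) in edges if s in age}, edges)
print(reference == pushed, sorted(pushed))  # True [('X', 40, 'A'), ('X', 40, 'B')]
```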

For simplicity, the distribution of joins is performed over the initial inputs and non-recursive relations in the recursive inputs, and only when the join columns are restricted to a single non-recursive relation in the inputs.

Rule 5: Let $IINP_i$ and $RINP_i$ be the initial and recursive input expressions for mutually recursive relation $R_i$ of a fixpoint operation. Let REGEXP be a regular relational expression whose output is, as usual, a single relation. Consider the expression:

$$
\mathrm{REGEXP} \bowtie_{x\,\theta\,f_j} \big((IINP_1,\dots,IINP_n)\ \Phi_{CE_1:OE_1,\ \dots,\ CE_n:OE_n}\ (RINP_1,\dots,RINP_n)\big)
$$

where $x$ is one of the columns of REGEXP, $f_j$ is one of the columns of mutually recursive relation $R_j$, and $\theta$ is a comparison operator. Consider the recursive input expression $RINP_j$ for relation $R_j$. It consists of the recursive relations $R_1, \dots, R_n$ and some non-recursive relations $NR_1, \dots, NR_m$. Similarly, the initial input expression $IINP_j$ for relation $R_j$ consists of relations $I_1, \dots, I_l$. If column $f_j$ takes all its initial inputs from a single column of a single input relation $I_k$ and all its recursive inputs from a single column of a non-recursive relation $NR_v$, then the above expression is equivalent to:

$$
\begin{aligned}
&(IINP_1,\dots,IINP_{j-1},\ (I_1,\dots,\mathrm{REGEXP} \bowtie_{x\,\theta\,f'_j} I_k,\dots,I_l),\ IINP_{j+1},\dots,IINP_n) \\
&\Phi_{CE_1:OE_1,\ \dots,\ CE_n:OE_n}\ (RINP_1,\dots,RINP_{j-1},\ (NR_1,\dots,\mathrm{REGEXP} \bowtie_{x\,\theta\,f''_j} NR_v,\dots,NR_m,\ R_1,\dots,R_n),\ RINP_{j+1},\dots,RINP_n)
\end{aligned}
$$

where $\bowtie$ denotes a regular join if the join column $f_j$ is a direct mapping column and a right outer join otherwise, and where $f'_j$ is the appropriate column in relation $I_k$ and $f''_j$ is the appropriate column in relation $NR_v$.
4. Commuting and associating join operations

Rule 6: The commutative and associative algebraic laws for join operations still hold if each fixpoint operation is treated as a whole. That is,

$$
A \bowtie (B\ \Phi\ (C, D)) \bowtie E \equiv A \bowtie E \bowtie (B\ \Phi\ (C, D)),\ \text{and}
$$

$$
(A \bowtie B) \bowtie (C\ \Phi\ (D, E)) \equiv A \bowtie \big(B \bowtie (C\ \Phi\ (D, E))\big)
$$

These commutative and associative properties allow the query optimizer to choose the best execution operations.

The algebraic properties of fixpoint operators developed above give the query optimizer more leeway in choosing efficient execution strategies. The transformation rules representing these properties can be added in a fairly straightforward manner to most existing query optimizers.

Most implementations of logic databases (see, for example, Morris, K., Ullman, J., and Gelder, A., "Design Overview of the NAIL System", Proc. 3rd Int. Conf. on Logic Programming, 1986, and Zaniolo, C., and Sacca, D., "Rule Rewriting Methods for Efficient Implementation of Horn Logic", MCC Technical Report DB-OMSJ^ Man* 1987) do not rely on any statistical information to determine their execution strategy. They commonly use simple heuristics which choose to extend the predicate with the largest number of bound arguments. One of the most promising features of the present approach is that it is targeted at existing relational query optimizers. Thus, the transformation rules presented herein become a tool for the query optimizer to choose among a menu of execution strategies based upon the estimated execution costs associated with each form a query can take.

From the foregoing it will be appreciated that the invention provides an effective and efficient method of evaluating linear and recursive queries in large databases. Existing relational techniques are integrated with the novel fixpoint operator and transformation procedures provided by the invention to optimize even very complex recursive queries.
Although a specific embodiment of the invention has been described and
illustrated, the invention is not to be limited to the specific forms or arrangements of parts 
so described and illustrated, and various modifications and changes can be made without 
departing from the scope and spirit of the invention. Within the scope of the appended 
claims, therefore, the invention may be practiced otherwise than as specifically described 
and illustrated. 
CLAIMS

1. A method of evaluating a recursive query (19) of a computerized database (11), the method comprising: translating a recursive query into an expression that includes a fixpoint operator (13); optimizing the expression according to a set of transformation procedures (15); and evaluating the optimized expression (17) by reference to data in the database (11).

2. A method according to claim 1 wherein one of the procedures comprises commuting (25) a projection operation with a fixpoint operation.

3. A method according to claim 1 wherein one of the procedures comprises 
commuting (27) a selection operation with a fixpoint operation. 

4. A method according to claim 3 wherein commuting a selection operation with 
a fixpoint operation comprises commuting a selection predicate on a direct mapping 
column (33) with a fixpoint operation. 

5. A method according to claim 3 wherein commuting a selection operation with 
a fixpoint operation comprises commuting a global selection predicate (35) with a fixpoint 
operation. 

6. A method according to claim 3 wherein commuting a selection operation with 
a fixpoint operation comprises commuting a selection predicate including a join (37) with 
a fixpoint operation. 

7. A method according to claim 1 wherein one of the procedures comprises 
distributing a join operation (29) over a fixpoint operation. 

8. A method according to claim 1 wherein one of the procedures comprises 
regrouping a join operation (31) and a fixpoint operation. 
9. A method according to claim 8 wherein regrouping comprises
commutation (39). 

10. A method according to claim 8 wherein regrouping comprises 
association (41). 

11. A method of evaluating a recursive query of a computerized database (11), the method comprising: in a computer, translating (13) a recursive query provided by a user into an expression that includes a fixpoint operator, automatically optimizing (15) the expression in the computer according to a set of transformation procedures (43) stored in the computer, and automatically evaluating (17) the optimized expression by reference to data in the computer database.